
    Wheels within Wheels: Making Fault Management Cost-Effective

    Local design and optimization of the components of a fault-management system results in sub-optimal decisions: the target system will likely fail to meet its objectives (under-perform) or cost too much if conditions, objectives, or constraints change. We can fix this by applying a nested management system to the fault-management system itself. We believe that doing so will produce a more resilient, self-aware system that can operate effectively across a wider range of conditions and provide better behavior at closer-to-optimal cost. This document summarizes the results of Working Group 7, "Cost-Effective Fault Management", at the Dagstuhl Seminar 09201 "Self-Healing and Self-Adaptive Systems" (organized by A. Andrzejak, K. Geihs, O. Shehory and J. Wilkes). The seminar was held from May 10th to May 15th, 2009, at Schloss Dagstuhl - Leibniz Center for Informatics.

    Modeling Event-driven Time Series with Generalized Hidden Semi-Markov Models

    This report introduces a new model for event-driven temporal sequence processing: Generalized Hidden Semi-Markov Models (GHSMMs). GHSMMs are an extension of hidden Markov models to continuous time that builds on turning the stochastic process of hidden state traversals into a semi-Markov process. A large variety of probability distributions can be used to specify transition durations. It is shown how GHSMMs can be used to address the principal problems of temporal sequence processing: sequence generation, sequence recognition and sequence prediction. Additionally, an algorithm is described for determining the parameters of GHSMMs from a set of training data: the Baum-Welch algorithm is extended by an embedded expectation-maximization algorithm. Under some conditions the procedure can be simplified to the estimation of distribution moments. A proof of convergence and a complexity assessment are provided.
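    The distinguishing feature described in the abstract is that each hidden-state transition carries its own duration distribution. A minimal sketch of such a semi-Markov process (state names, transition probabilities, and mean durations are all illustrative, and exponential durations stand in for the arbitrary distributions GHSMMs permit):

```python
import random

# Illustrative two-state semi-Markov process: discrete transition
# probabilities plus a per-transition duration distribution.
TRANSITIONS = {
    "ok":       [("ok", 0.9), ("degraded", 0.1)],
    "degraded": [("ok", 0.6), ("degraded", 0.4)],
}
# Mean sojourn time per transition (exponential here for simplicity;
# GHSMMs allow a wide variety of duration distributions).
MEAN_DURATION = {("ok", "ok"): 10.0, ("ok", "degraded"): 2.0,
                 ("degraded", "ok"): 5.0, ("degraded", "degraded"): 1.0}

def sample_path(start, horizon, rng=None):
    """Generate (time, state) events until the time horizon is exceeded."""
    rng = rng or random.Random(0)
    t, state, path = 0.0, start, [(0.0, start)]
    while t < horizon:
        succs = TRANSITIONS[state]
        nxt = rng.choices([s for s, _ in succs], [p for _, p in succs])[0]
        t += rng.expovariate(1.0 / MEAN_DURATION[(state, nxt)])
        state = nxt
        path.append((t, state))
    return path

events = sample_path("ok", horizon=50.0)
```

    The continuous, transition-specific sojourn times are what let such a model represent event-driven log timestamps directly instead of forcing them onto a fixed discrete time grid.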

    Reliability Modeling of Proactive Fault Handling

    Research on dependable computing is undergoing a shift from traditional fault tolerance towards techniques that handle faults proactively. These techniques comprise two parts: (a) prediction of failures and (b) actions that are performed in case of an upcoming failure. This work provides the first reliability model that incorporates both correct and false predictions as well as both types of actions: failure prevention and recovery preparation. Closed-form solutions for availability, reliability and hazard rate are provided.
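    To illustrate the kind of trade-off such a model captures, here is a deliberately simplified availability sketch (not the paper's closed form): a fraction of failures is averted by correct predictions, while false predictions add overhead downtime. All parameter names and the linear-downtime approximation are assumptions for illustration only.

```python
def availability(mttf, mttr, recall=0.0, fp_rate=0.0, fp_cost=0.0,
                 prevention_success=1.0):
    """Toy steady-state availability under proactive fault handling.

    mttf, mttr         -- mean time to failure / to repair (same time unit)
    recall             -- fraction of failures that are correctly predicted
    prevention_success -- fraction of predicted failures actually averted
    fp_rate, fp_cost   -- false predictions per time unit and downtime each

    Approximation: availability = 1 - expected downtime per time unit.
    """
    effective_failure_rate = (1.0 - recall * prevention_success) / mttf
    downtime_per_unit = effective_failure_rate * mttr + fp_rate * fp_cost
    return 1.0 - downtime_per_unit
```

    Even this toy version shows why both correct and false predictions must appear in the model: a predictor with high recall but a high false-positive rate can lower availability rather than raise it.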

    Advanced Failure Prediction in Complex Software Systems

    The availability of software systems can be increased by preventive measures which are triggered by failure prediction mechanisms. In this paper we present and evaluate two non-parametric techniques which model and predict the occurrence of failures as a function of discrete and continuous measurements of system variables. We employ two modelling approaches: an extended Markov chain model and a function approximation technique utilising universal basis functions (UBF). The presented modelling methods are data-driven rather than analytical and can handle large numbers of variables and large amounts of data. Both modelling techniques have been applied to real data from a commercial telecommunication platform. The data include event-based log files and continuously measured system states. Results are presented in terms of precision, recall, F-measure and cumulative cost. We compare our results to standard techniques such as linear ARMA models. Our findings suggest significantly improved forecasting performance compared to alternative approaches. By using the presented modelling techniques, software availability may be improved by an order of magnitude.
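    The precision, recall, and F-measure used to report these results are the standard quantities computed from prediction counts; a minimal sketch (function and variable names are illustrative):

```python
def prediction_metrics(tp, fp, fn):
    """Prediction quality from counts of true positives (failures correctly
    predicted), false positives (false alarms), and false negatives
    (missed failures)."""
    precision = tp / (tp + fp)          # predicted failures that were real
    recall = tp / (tp + fn)             # real failures that were predicted
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# e.g. 8 correct predictions, 2 false alarms, 4 missed failures:
p, r, f = prediction_metrics(8, 2, 4)   # p = 0.8, r = 2/3
```

    The cumulative-cost metric mentioned in the abstract goes beyond these counts by weighting each false alarm and missed failure with its operational cost.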

    Error Log Processing for Accurate Failure Prediction

    Error logs are a fruitful source of information both for diagnosis and for proactive fault handling; however, elaborate data preparation is necessary to filter out the valuable pieces of information. In addition to the use of well-known techniques, we propose three algorithms: (a) assignment of error IDs to error messages based on Levenshtein's edit distance, (b) a clustering approach to group similar error sequences, and (c) a statistical noise-filtering algorithm. In experiments using data from a commercial telecommunication system, we show that data preparation is an important step towards accurate error-based online failure prediction.
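    Step (a) can be sketched as follows: compute the edit distance between an incoming message and known message templates, and reuse an existing ID when the normalized distance is small. The helper `assign_error_id`, the threshold value, and the normalization are illustrative assumptions, not the paper's algorithm.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def assign_error_id(message, known, threshold=0.3):
    """Hypothetical helper: reuse the ID of the closest known template if the
    length-normalized distance is within `threshold`, else mint a new ID."""
    best_id, best_d = None, 1.0
    for eid, template in known.items():
        d = levenshtein(message, template) / max(len(message), len(template))
        if d < best_d:
            best_id, best_d = eid, d
    if best_id is not None and best_d <= threshold:
        return best_id
    new_id = max(known, default=0) + 1
    known[new_id] = message
    return new_id
```

    Grouping by edit distance lets messages that differ only in variable parts (device names, counters, addresses) map to the same error ID before sequence clustering and noise filtering are applied.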

    Predicting failures of computer systems: a case study for a telecommunication system

    The goal of online failure prediction is to forecast imminent failures while the system is running. This paper compares Similar Events Prediction (SEP) with two other well-known techniques for online failure prediction: a straightforward method based on a reliability model, and the Dispersion Frame Technique (DFT). SEP is based on recognition of failure-prone patterns utilizing a semi-Markov chain in combination with clustering. We applied the approaches to real data from a commercial telecommunication system. Results are presented in terms of precision, recall, F-measure and accumulated runtime cost. The results suggest significantly improved forecasting performance.